Tidying Up International Nucleotide Sequence Databases: Ecological, Geographical and Sequence Quality Annotation of ITS Sequences of Mycorrhizal Fungi
Authors
Abstract
Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identifying mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and often poorly annotated with metadata. To detect chimeric and low-quality sequences and to assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether, 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information on country of collection, geographical coordinates, interacting taxon and isolation source was added to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In the European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and to all taxonomic groups and functional guilds of fungi.
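To put the figures in the abstract on a common scale, the short Python sketch below converts the flagged-sequence counts into percentages and the coverage percentages into approximate sequence counts. It assumes that all four coverage percentages refer to the full set of 35,632 sequences, which the abstract does not state explicitly. The variable and function names are illustrative only and do not come from UNITE or PlutoF.

```python
# Figures quoted in the abstract.
TOTAL_SEQUENCES = 35_632
flagged_counts = {"chimeric": 677, "low_read_quality": 2_174}
coverage_percent = {
    "country_of_collection": 78.0,
    "geographical_coordinates": 33.0,
    "interacting_taxon": 41.7,
    "isolation_source": 96.4,
}


def share_percent(count: int, total: int = TOTAL_SEQUENCES) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


def approximate_count(percent: float, total: int = TOTAL_SEQUENCES) -> int:
    """Return the approximate number of sequences implied by a percentage.

    Assumption: the percentage is taken over all `total` sequences.
    """
    return round(total * percent / 100.0)


# Share of the dataset flagged as chimeric or as low read quality.
flagged_share = {name: share_percent(n) for name, n in flagged_counts.items()}

# Approximate number of sequences carrying each kind of metadata.
annotated_estimate = {
    field: approximate_count(p) for field, p in coverage_percent.items()
}

if __name__ == "__main__":
    for name, pct in flagged_share.items():
        print(f"{name}: {pct}% of sequences")
    for field, n in annotated_estimate.items():
        print(f"{field}: about {n} sequences")
```

Because the estimates are back-calculated from rounded percentages, they are indicative only and may differ slightly from the exact counts in the underlying study.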
Similar resources
Data processing can mask biology: towards better reporting of fungal barcoding data?
Fungal barcoding, that is the use of genetic markers to identify fungal species, has contributed enormously to the rise of mycorrhizal research in the last decade (van der Heijden et al., 2015) because it allows quick and easy en masse identification of species or higher taxonomic ranks and grouping of sequences into entities; this speeds up ecological analyses and the discovery of new species ...
PlutoF—a Web Based Workbench for Ecological and Taxonomic Research, with an Online Implementation for Fungal ITS Sequences
DNA sequences accumulating in the International Nucleotide Sequence Databases (INSD) form a rich source of information for taxonomic and ecological meta-analyses. However, these databases include many erroneous entries, and the data itself is poorly annotated with metadata, making it difficult to target and extract entries of interest with any degree of precision. Here we describe the web-based...
Nucleotide sequence of cDNA encoding for preprochymosin in native goat (Capra hircus) from Iran
Prochymosin is one of the most important aspartic proteinases used as a milk-clotting enzyme in cheese production. In the present investigation we report the sequence of the cDNA encoding goat (Capra hircus) preprochymosin and compare its nucleotide and deduced amino acid sequences with the preprochymosin sequences of other ruminants. Like bovine prochymosin, the caprine prochymosin cDNA encodes 365 amino ...
Protein Sequence Annotation in the Genome Era: The Annotation Concept of SWISS-PROT + TREMBL
SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation, a minimal level of redundancy and a high level of integration with other databases. Ongoing genome sequencing projects have dramatically increased the number of protein sequences to be incorporated into SWISS-PROT. Since we do not want to dilute the quality standards of SWISS-PROT by incorporati...
Towards a unified paradigm for sequence-based identification of Fungi
The nuclear ribosomal internal transcribed spacer (ITS) region is the formal fungal barcode and in most cases the marker of choice for exploration of fungal diversity in environmental samples. Two problems are particularly acute in the pursuit of satisfactory taxonomic assignment of newly generated ITS sequences: (i) the lack of an inclusive, reliable public reference dataset, and (ii) the lack...
Journal title:
Volume 6, Issue
Pages -
Publication date: 2011